zy1git (Contributor) commented Oct 31, 2025

Summary:
Background
Currently, torchvision.transforms.v2.SanitizeBoundingBoxes fails when used inside a v2.Compose that receives both bounding boxes and a semantic segmentation mask as inputs. The transform applies its per-box boolean validity mask (one entry per box) to every tv_tensors.Mask, including semantic masks of shape [H, W]. Boolean indexing compares the validity mask's length N against dim 0 of the mask, which for a semantic mask is H, so the shapes mismatch and the call crashes.
Error Example:
IndexError: The shape of the mask [3] at index 0 does not match the shape of the indexed tensor [1080, 1920] at index 0
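To see why the crash happens, here is a shape-level sketch in plain Python (no torch dependency) of the rule boolean-mask indexing applies: a 1-D boolean mask must have the same length as dim 0 of the indexed tensor. The helper `check_bool_index` is hypothetical and only mimics that check; it shows that the per-box validity mask lines up with dim 0 of an [N, H, W] instance mask but not with dim 0 (H) of an [H, W] semantic mask.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def check_bool_index(tensor_shape: Shape, valid: Sequence[bool]) -> Shape:
    """Mimic indexing a tensor of `tensor_shape` with a 1-D boolean mask `valid`.

    Hypothetical helper: returns the resulting shape, or raises IndexError when
    the mask length does not match dim 0, worded like the error quoted in the PR.
    """
    mask_shape = [len(valid)]
    if len(tensor_shape) < 1 or tensor_shape[0] != mask_shape[0]:
        raise IndexError(
            f"The shape of the mask {mask_shape} at index 0 does not match "
            f"the shape of the indexed tensor {list(tensor_shape)} at index 0"
        )
    # Boolean indexing keeps only the rows whose flag is True.
    return (sum(1 for v in valid if v),) + tuple(tensor_shape[1:])


valid = [True, False, True]  # 3 boxes, the middle one is invalid

# Per-instance masks [N, H, W]: dim 0 lines up with the boxes.
print(check_bool_index((3, 1080, 1920), valid))

# Semantic mask [H, W]: dim 0 is H, not N, so indexing fails.
try:
    check_bool_index((1080, 1920), valid)
except IndexError as e:
    print(e)
```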
Expected Behavior
The transform should only sanitize masks that have a 1:1 mapping with bounding boxes (e.g., per-instance masks).
Semantic masks (2D, shape [H, W]) should be passed through unchanged.
Task Objectives
Update SanitizeBoundingBoxes Logic:
Detect whether a tv_tensors.Mask is a per-instance mask (shape [N, H, W] or [N, ...] where N == num_boxes) or a semantic mask (shape [H, W]).
Only apply the per-box validity mask to per-instance masks.
Pass through semantic masks unchanged.
If a mask's leading dimension does not match the number of boxes, do not raise an error; pass the mask through instead.
Optionally, log a warning when a mask is skipped because of such a shape mismatch.
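The detection rules listed above can be sketched at the shape level. This is plain Python with no torch dependency: shapes stand in for `tv_tensors.Mask` objects, the function name `sanitized_mask_shape` is hypothetical, and using `warnings.warn` for the optional warning is an assumption (the task only says "log a warning").

```python
import warnings
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def sanitized_mask_shape(mask_shape: Shape, valid: Sequence[bool]) -> Shape:
    """Hypothetical shape-level sketch of the proposed per-mask decision.

    `valid` holds one flag per bounding box (True = keep the box).
    """
    num_boxes = len(valid)
    if len(mask_shape) == 2:
        # [H, W]: semantic mask, passed through unchanged (no warning).
        return tuple(mask_shape)
    if len(mask_shape) >= 3 and mask_shape[0] == num_boxes:
        # [N, H, W] or [N, ...]: per-instance masks, drop rows of invalid boxes.
        return (sum(1 for v in valid if v),) + tuple(mask_shape[1:])
    # Leading dim does not match the number of boxes: pass through, but warn.
    warnings.warn(
        f"Mask of shape {list(mask_shape)} does not match {num_boxes} boxes; "
        "passing it through without sanitization."
    )
    return tuple(mask_shape)


valid = [True, False, True]
print(sanitized_mask_shape((3, 64, 64), valid))  # per-instance: filtered
print(sanitized_mask_shape((64, 64), valid))     # semantic: unchanged
print(sanitized_mask_shape((5, 64, 64), valid))  # mismatch: unchanged, warns
```

Treating every 2-D mask as semantic, instead of also checking `shape[0] == num_boxes` for it, is a choice made in this sketch: otherwise an [H, W] mask whose height happens to equal the number of boxes would be filtered by mistake.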
Clarify Documentation:
Update the docstring for SanitizeBoundingBoxes to explicitly state:
Only per-instance masks are sanitized.
Semantic masks are passed through unchanged.
The transform does not require users to pass masks to labels_getter for them to be sanitized.
Add examples for both use cases (per-instance and semantic masks).
Add/Update Unit Tests:
Test with both per-instance masks and semantic masks in a v2.Compose.
Ensure semantic masks are not sanitized and do not cause errors.
Ensure per-instance masks are sanitized correctly.
These tests can be added to TestSanitizeBoundingBoxes.
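A minimal test sketch for the mixed case, written against a hypothetical pure-Python stand-in (`sanitize_masks`) rather than the real transform: per-instance masks are nested lists of shape [N, H, W] and a semantic mask is an [H, W] nested list. In TestSanitizeBoundingBoxes, equivalent assertions could be made on the `tv_tensors.Mask` outputs of a `v2.Compose`.

```python
from typing import Any, Dict, List


def _ndim(x: Any) -> int:
    """Depth of a non-empty nested list (a 2x2 grid has ndim 2)."""
    depth = 0
    while isinstance(x, list) and x:
        depth += 1
        x = x[0]
    return depth


def sanitize_masks(masks: Dict[str, list], valid: List[bool]) -> Dict[str, list]:
    """Hypothetical stand-in: filter per-instance masks, pass the rest through."""
    out = {}
    for name, mask in masks.items():
        if _ndim(mask) >= 3 and len(mask) == len(valid):
            out[name] = [inst for inst, keep in zip(mask, valid) if keep]
        else:
            out[name] = mask  # semantic or mismatched: unchanged
    return out


def test_instance_and_semantic_masks_together():
    valid = [True, False, True]                      # box 1 is invalid
    instance = [[[i, i], [i, i]] for i in range(3)]  # [3, 2, 2]
    semantic = [[0, 1], [2, 3]]                      # [2, 2]
    out = sanitize_masks({"instance": instance, "semantic": semantic}, valid)
    # Per-instance masks: the entry for the invalid box is dropped.
    assert out["instance"] == [[[0, 0], [0, 0]], [[2, 2], [2, 2]]]
    # Semantic mask: returned as-is, no error.
    assert out["semantic"] is semantic


def test_per_instance_only_behaviour_unchanged():
    valid = [False, True]
    instance = [[[7]], [[9]]]                        # [2, 1, 1]
    out = sanitize_masks({"instance": instance}, valid)
    assert out["instance"] == [[[9]]]


test_instance_and_semantic_masks_together()
test_per_instance_only_behaviour_unchanged()
```

The second test covers the backward-compatibility point: inputs that only contain per-instance masks are sanitized exactly as before.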
Backward Compatibility:
Ensure that the change does not break existing datasets or user code that relies on current behavior.
Finally, submit a PR with the changes and link the issue in the description.

Differential Revision: D85840801

pytorch-bot bot commented Oct 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9256

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 8940dc3 with merge base ca22124:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the cla signed label Oct 31, 2025

meta-codesync bot commented Oct 31, 2025

@zy1git has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85840801.

zy1git pushed a commit to zy1git/vision that referenced this pull request Oct 31, 2025
zy1git pushed a commit to zy1git/vision that referenced this pull request Nov 4, 2025

NicolasHug (Member) left a comment

Nice work @zy1git, thanks for the PR!
